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Abstract 

We develop improved rearrangement algorithms to find the dependence structure that min¬ 
imizes a convex function of the sum of dependent variables with given margins. We propose a 
new multivariate dependence measure, which can assess the convergence of the rearrangement 
algorithms and can be used as a stopping rule. We show how to apply these algorithms for 
example to finding the dependence among variables for which the marginal distributions and 
the distribution of the sum or the difference are known. As an example, we can find the depen¬ 
dence between two uniformly distributed variables that makes the distribution of the sum of 
two uniform variables indistinguishable from a normal distribution. Using MCMC techniques, 
we design an algorithm that converges to the global optimum. 
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Algorithms for Finding Copulas 


Minimizing Convex Functions of Sums 


Introduction 


For specified marginal distributions such as the uniform or the normal distribution, can we find 
a dependence structure or copula, which provides a specific distribution for the sum of n vari¬ 
ables? What if we were to require that the sum be constant? Questions like this have been 
address ed theoretically in the l it erature in a number of pap er s with the concept of co mplete mix- 
ability ( Wang and Wand ( 201 il l. Puccetti and Wang ( 2015bl ). Wang and Wang ( 2016) to cite only 
a few) , and computationally, with the rearrangemen t algorithm (RA) (jPuccetti and Riischendorf 
( 20121 ) . Embrechts. Puccetti. and Riischendorf ( 2013 )'). The RA aims to minimize the expectation 
of a convex function of a sum of random variables (including the case of minimization of the vari¬ 
ance of the sum as a special case)Q This algorithm is fast and simple but may not converge to 
the global minimum. In particular, it does not depend on the convex function to minimize. Our 
main objective in this paper is to further this discussion by developing an improved version of this 
algorithm. 

The minimization of convex functions of a sum of dependent random variables can be formulated 
using a matrix X := (X l j )i j. and is linked to the problem of minimizing a convex function of the row 
sums Xij over all permutations within the columns. It is a highly computationally co mplex 

proble m, as even in the special case of n = 3 columns, it has been shown to be NP-complete (Haul 
(2015)). It means that no algorithm will guarantee convergence to the optimum in polynomial time. 
Enumeration might be considered, but for reasonable sized matrices this is also not feasible. For 
a m x n matrix, the number of essentially distinct matrices that can be obtained is the number of 
permutations of the columns other than the first, which is (m!) n_1 . For example, for a small 10 x 6 
matrix this is (10!) 5 = 6.29 2 4 x 10 32 , obviously completely impossible by enumeration. Various 
versions of this problem have been treated in the l itera t ure, a nd algorithms proposed for special 
cases, but because the problem is NP complete (see Hsu ( 1984 1)). these algorithms do not converge 
to an optimal. For example Coffman and Yannakakisl ( 19841 ) propose an algorithm designed to 
minimize the maximum of the row sums (the assembly line crew scheduling problem) and show 
that this algorithm converges to a solution less than 1.5 times the optimal. The NP complete nature 
of the problem indicates that the worst case performance of algorithms may be unsatisfactory, but 
it is still possible to develop algorithms, which normally find the optimum very quickly. We present 
Markov Chain algorithms here, which guarantee finding the global optimum in finite time, usually 


1 More details on the RA can be found at https://sites.google.com/site/rearrangementalgorithm/ 
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very rapidly. 


There are many more applications for the Rearrangement Algorithm (RA) of Puccett i and Rtischendo rl 
( 2012 ). It has been used successfully in recent advances to the risk management held. Specifi¬ 
cally, the RA is used to measure model risk on dependence, also called “dependence uncertainty,” 
and can help regulators to m ake decisions on w h ich risk measure is most appropri a te to com¬ 
pute capital requirements (see Embrechts. Puccetti. Riischendorf, Wang, and Belerai ( 2014l U. It 
was successfully used to appr oximate VaR bounds on the sum of n de pendent risks with given 
marginal distributions by [Embrechts. Puccetti. and Ruschendorl ( 2013h by applying the RA to 
the largest rows of the matrix such that all risks are comonotonic. Ma ny more applications of 
this RA have been recently developed. Among othe rs, Aas and Puccetti! 12014 1 use t he st andar d 

RA to compute capital requirements of DNB ba nk, iBernard. Riischendorf. and Vanduffell (120161). _ 

Bernard. Riischendorf. Vanduffel. and Yaol ( 201fil ) to assess portfolios’ credit risk and Bernard and Vanduffel 


(1201 ol ) ;o incorp orate partial information on depe ndence in the computation of bounds on capi 
tal requirements. IBernard. Jiang, and Wand (1201411 to der i ve boun ds on convex risk measures and 
quantify dependence uncertainty and P uccett i and Wan g (2015aj) to detect complete mixability. 
As we expect many more applications of such algorithms, it is important to develop efficient and 
accurate algorithms that converge to a minimum, a feature of our algorithms. 


In this paper, we develop a novel application. We are able to find the dependence structure such 
that the sum of dependent variables has a prescribed distribution. We will refer to two distributions 
F and G as “close” if a large sample from one, say from G, cannot be detected as not coming from 
F with high probability using a standard test. We will use the Kolmogorov-Smirnov test and the 
Wasserstein distance test. This allows us, for example, to find two dependent Uniform[0,1] random 
variables Ui such that U\ + U 2 — Z mr7 2 is nearly 0 where Z ma 2 is a normally distributed variable 
with mean m and variance cr 2 , in other words so that U\ + U 2 is nearly normal. 


This toy example illustrates a potential use of the methodology to infer the dependence among 
variables that can explain a given distribution for the sum. At first, it may sound limited and 
more a mathematical curiosity but this methodology can also be useful in practice. For example, 
it can be used in finance. Assuming that prices of basket options or spread options are available 
at the same time as prices of regular options written on individual stocks, our methodology can 
then be useful to infer a multivariate model for the assets that is consistent with this information. 
It is then particularly interesting in fitting the bivariate distributio n between gas and electric ¬ 
ity prices given that spark spre a d options are acti vely traded (see lAlexander and Scoursel (j2004r ) , 
Carmona and Durrleman ( 2f)03h . Rosenberg! ( 20f)fll U. 


The paper is organized as follows. In Section[T] we pr esent a new mult ivariat e measure inspired by 
the notion of E-countermonotonicity discussed by Puccetti and Wangl (2015b). Then, in Section [2], 
we develop improved rearrangement algorithms that may use this multivariate dependence measure 
as a stopping rule and discuss their relative performance. Our algorithms can converge in fewer 
steps by selecting the blocks optimally and converge to a point much closer to the global optimum 
than the standard RA, often by orders of magnitude. Section [3] illustrates the methodology with the 
explicit construction of the dependence that makes the sum of two uniformly distributed variables 
indistinguishable from a normal distribution. More generally, we are interested in whether a copula 
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exists such that the sum of m random variables from one distribution has another prescribed 
distribution. We then briefly discuss an application to finance. Finally, in Section [H we show how 
to modify the block RA using Markov Chain Monte Carlo methods to guarantee finding the global 
minimum in finite time and accounting for a given convex measure of the sum. 


1 A new multivariate measure of dependence 


In this section, we propose to extend any dependence measure defined between two random variables 
to a multivariate dependence measure in a natural way. 


1.1 A new multivariate measure based on E-countermonotonicity 


This multivariate measure will play a crucial role in assessing the convergence of the rearrange- 
ment algorithm that minimizes th e var iance of the sum of dependent risks wi t h give n marginals 
( Puccetti and Riischendorl ( 2012h and Embrechts. Puccetti. and Riischendorl ( 2013 b. It is in¬ 
spired by the recent notion of E-countermonotonicity introduced by Puccetti and Wang ( 20151)1 ) in 
which all sums over di sjoint subsets II and LI such that II U LI = {1, 2,..., n} are countermonotonic 
(see also Lee and Ahnl ( 20141 )). 


Definition 1.1. Let </>(Xi,X 2 ) be a measure of dependence between two columns of data Xi and 
Xt such as Spearman’s rho, Kendall’s tau, or Pearson correlation coefficient. For a matrix of data 
X = [Xi, X 2 ,..., X n _i, X n ] with n columns, we define the multivariate measure of dependence 


<?( x ) := 2 n-i_i E ^ 
ucv 

where the sum is over the set V consisting of 2 n ~ 1 — 
empty subsets II and its complement Sq 


E x -E x < 

d en *en 


(1) 


1 distinct partitions of {1, 2,..., n} into non- 


For the remainder of the paper, we assume that 4> is the Spearman correlation. Let us recall its 
definition for two continuous random variables X and Y with respective marginal c.d.f. Fx and 
Fy- The Spearman correlation is then equal to 


<KX,Y) 


co v(F x (X),F y (Y)) 

V / var(Fx(X))var(Fy (Y)) ’ 


( 2 ) 


which corresponds to the correlation between the two uniformly distributed generators of X and Y 
respectively. The results using alternatives such as Kendall’s tau would be similar. The minimum 

2 There are 2 n partitions (II, fl) so that (1, 2,..., n} = null and linn = 0. But the measure <j> (X)ign Xi, E,gn Xi) 
is usually meaningless when either II or S are empty. Moreover the partition (II , II) is essentially counted twice. So 
there are ( 2 n — 2)/2 relevant “distinct” partitions into non-empty subsets. 
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Spearman correlation is -1 and it is achieve d by the countermonotonicity s t ructu re (originally called 
“antithetic” dependence in the language of Hammerslev and Handscomb (1964])). 


This measure g(X) is different from the multivariate Kendall’s tau, multivariate Spearman cor¬ 
relation, the aver age pairwise Kendall’s tau, or the average pairwise Spearman correlation recalled 
in Definition 8 of Lee and Ahn ( 2014h . In (jT]) , we average the bivariate dependence measure be¬ 
tween two sums taken over the two subsets II and II of a partition of {1, 2,..., n}, i.e. two disjoint 
non-empty sets II and II with II U LI = {1,2, ...,n}. Contrary to existing multivariate dependence 
measures, it is not driven by the dependence pairwise. In addition, it has a nice connection with 
convex order as outlined in Bemark 11.21 hereafter. 


This n-dimensional dependence measure, g(X), can be unbiasedly estimated either by choosing 
some of the 2 n_1 — 1 such partitions without replacement or by assigning columns at random, e.g. 
put 

S n = X 1 +X 2 + ... + X n _ 1 + X n 

and average the values of 

( n n \ 

E T;X 8 , S n - ^ Ij X,; j (3) 

i =1 i =1 / 

over many samples of random independent Bernoulli variables R for which 0 < £7=1 U < n. 

Remark 1.1. As a side remark, we give the continuous formulation of our newly proposed multi¬ 
variate risk measure g. Starting from the definition of the Spearman correlation in ([2]), and using 
the moments of a uniformly distributed variable, var(Tx(X)) = and E(Fx{X)) = 


i ri 


<t>{X,Y) = 12 



P(F x (X) > x,F y (Y) > y)dxdy - 3 


o J o 


(see for example iNels en (j2006j)). Therefore, with Sn = £j gn we define the multivariate measure 
of dependence related to the joint distribution of a random vector X of dimension n, 


1 


e(X) := y!-!,! E^( 5 n, 5 n) 


ngp 


= 12 


o Jo 


2 n—1 - 1 


E P{Fsu(Sn)>x,F Sn {S n )>y ) 


nev 


dxdy — 3 


This can be estimated unbiasedly by choosing one or more partitions II and II at random in the 
set V of all possible 2 n_1 — 1 partitions and corresponding uniformly distributed random numbers 
U,V U[ 0,1] and using 12 times the proportion of times that Tg n (Sn) > U and Fs n (Sf j) > V 
minus 3. 


1.2 Necessary condition to minimize convex functions of a sum 


It has been noted in Puccetti and B.uschendorf 12012 1 that the situation in which all the columns 
are countermonotonic with the sum of all others is a necessary condition to have a dependence 
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structure that minimizes the expectation of a convex function of a sum. Proposition 11.11 below is 
a straightforward extension. The result holds for the minimization of any expectation of a convex 
function and as a special case for the variance of the sum var(^ Xf). We provide a counterexample 
to show that the condition is not sufficient. 

Proposition 1.1 (Necessary condition to minimize expected convex functions of a sum). 

Let f be a convex function. If E(/(XOj X,,)) is at a minimum then 4> (X^ien^ 0 Siefl^i) 
minimized for every partition into two sets II and ft. However, the converse does not hold in 
general. 


Proof. The sufficient condition is proved in Theorem 3.8 (d) of Puccetti and Wane ( 2015bh . The 
other direction is unfortunately false. For example, consider the matrix below: 


B 1 


/ 0.0662 
0.3271 
0.6524 

V 1.0826 


0.2571 

1.0061 

-0.6509 

-0.9444 


0 

-1.3218 

-0.0549 

0.9248 


-0.5842 \ 
-0.0833 
0.2495 
-0.9263 / 


(4) 


It is straightforward to check (with basic calculations) that for all 7 possible partitions II, ft we 
have that Sfi, Sg are countermonotonic so that cf) (X^gn A; Sien A) = — 1 for all such partitions. 
The variance of the row sums is 0.04346. However, the matrix 


B 2 


/ 0.0662 
0.3271 
0.6524 
\ 1.0826 


1.0061 

0.2571 

-0.6509 

-0.9444 


-1.3218 

0 

0.9248 

-0.0549 


0.2495 \ 
-0.5842 
-0.9263 
-0.0833 / 


(5) 


obtained by a slightly different permutation of the columns provides constant (= 0) row sums with 
a strictly smaller value of the variance of the row sums (as the variance is then equal to zero). It 
is a counterexample for the expectation of any convex function and not just for the variance. □ 

Remark 1.2. Recall that the Spearman correlation cf between Su and 5g is minimized with the 
value -1 achieved by the countermonotonicity between the pair of sums Sri and Sg. Therefore, 
Proposition 1 1.1 1 shows that a necessary condition to attain a dependence between X, that minimizes 
the expectation of a convex function and thus the variance is that 


£?(X) = -1. 


Note also that Proposition 11.11 can be applied more generally to supermodular functions and 
convex functions of a sum are only special cases. 


2 Improved Rearrangement Algorithms 


In this secti on, we start by reca lling the standard rearrang ement algorithm (RA) of Puccetti and Ruschendorf 
f 2012h and Embrechts. Puccetti. and Ruschendorf (2013!). We then show how to improve it by de¬ 
signing the Block RA. We then illustrate the improvement through some numerical examples. To 
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facilitate the exposition in this section, we develop algorithms aimed at minimizing the variance of 
the sum. In Section [4] we will show how to adapt these algorithms to ensure convergence to the 
global minimum of the expectation of a specific convex function of the sum that is not necessarily 
the variance. 


2.1 Standard Rearrangement Algorithm 

The standard rearrangement algorithm is a method of constructing dependence between variables 
Xj (j = 1,2,, n ) such that the variance of the sum S n becomes as small as possible. Consider 
a matrix X = \xij\i,j, corresponding to a multivariate vector [Xi, X 2 ,..., X n ], 


Standard Rearrangement Algorithm For i from 1 to n, make the i th column countermono¬ 
tonic with the sum of the other columns. Repeat this process (by starting again from the first 
column) until each column is countermonotonic with the sum of the other columns. 

At each step of this algorithm, we make the j th column countermonotonic with the sum Aj, 
so that the variance of the sum of all columns before rearranging is larger than the variance of the 
sum of all columns after rearranging. At each step of the algorithm the variance decreases, it is 
bounded from below (by 0) and thus converges (given that there is a finite number of permutations 
of rows and columns). If it gets to 0, we have found a perfect mixability situation in which the 
dependence makes the sum constant. Otherwise, there is no guarantee that we have found the 
global minimum of the variance of the sum over all dependence structures. 


We note however that it is possible to converge to a matrix X for which £>(X) > — 1 and therefore, 
that does not satisfy the necessary condition of Proposition ll.il For example, consider the following 
matrix 


1.1423 

0.3674 

1.8266 

2.1637 

1.9135 

0.9880 

0.5237 

2.0392 

2.8994 

0.0377 

1.5924 

1.0061 

4.0077 

0.8852 

0.1974 

0.4097 


The matrix C is such that Sn, and Sfj are countermonotonic whenever II = {i}. In this case, the 
multivariate dependence measure is g(X.) = —0.9714 because certain subsets are not countermono¬ 
tonic, in particular 5n and Sjj with II = {1,3} and {2,3}. For this matrix, the standard RA has 
already declared convergence. 


2.2 Block Rearrangement Algorithm 

We now construct a version of the rearrangement algorithm designed to reduce the measure g(X.) 
in order to improve the convergence to the minimum variance. Suppose for each partition II € V 
we know the values of pn = 4> (XXen X-i, X^ e n X?) • hi order to reduce the variance of the sum, we 
wish to reduce the covariances cov (X^en X-i, Xj) and in particular, rearrange so as to reduce 
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the largest of these values. We will therefore apply a rearrangement of the elements of X,;, i £ II 
so that the sums Xlien x i are countermonotonic to X i- 

Suppose that the matrix X = [Xi, X 2 , ...,X n _i,X n ] has covariance matrix S. Note that Sn = 
Eieri x -i and so 

var X^ = var(Sn) + var(Sq) + 2cov(Sn, Sq) 

This consists of the sum of three classes of elements of the covariance matrix: 

(a) the sum of for both i,j£ II 

(b) the sum of for both i.j fS 

(c) the sum of Xjj for i £ H,j £ S. 

An algorithm which proceeds at each step by keeping the values of var(Sn), var(S'n) constant 
while minimizing the value of cov(Sn,Sn) over rearrangements of the blocks, is bound to result 
in a non-increasing variance and will therefore converge. In order to obtain a maximum benefit 
from this single rearrangement, we wish to choose a subset II for which the Spearman correlation 
<fi (S'n, Sq) is the largest and then rearrange the second block so that Sg is countermonotonic to 
the values of S'n, thereby rendering 0(Sn,Sn) = —1. Since var(Sn), var(Sjj) are unchanged and 
cov(Sn,Sn) is reduced, this results in a reduction of var(^NXj). It turns out that choosing the 
largest Spearman correlation ^(Sn,Sf[) among a relatively small number of possible partitions 
speeds up the algorithm and is adequate. For a matrix X with n columns, there are p := 2 n_1 — 1 
possible subsets of II C {1,2,3, ...,n} such that fl is non-empty so there are p possible partitions 
in V. In our algorithm, at each stage we choose to compare </>(Sn,Sn) over min(p, 512) different 
partitions {11,11}, chosen at random from this set of p possible partitions. 


Block Rearrangement Algorithm (Block RA1) 

1. Select a random sample of n S i m possible partitions of the columns {1, 2,..., n} into non-empty 
subsets {II, n}. Note if n S i m = 2 n ~ 1 — 1, all partitions are considered. 

2. For each of the above partitions compute pu = </>(Sn, S n ) • Identify the partition with the 
largest value of pn- 

3. Rearrange the second block so that Sfj is countermonotonic to the values of Sn- 

4. Compute the value of u(X) = i -,„_ 1 1 _ 1 neV $ (^ n ’ *^n) 

5. ii p(X) > —1, return to step 1. Otherwise, output the current matrix X. 

Remark 2.1. In the selection of a candidate partition (in step 2 above), Pearson correlation can 
be used in place of Spearman correlation. On the one hand, there is a significant computational 

3 This condition can be replaced by a number close to -1 such as -0.9999. 
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advantage because Pearson correlation between all possible partitions is a function of the covariance 
matrix, which can be computed once only. On the other hand, the effect of the RA on the Spearman 
correlation is very clear as it replaces it by -1 after the algorithm is applied, whereas the effect of 
Pearson correlation on the variance of the sum cannot be easily predicted before running the RA. 
Using Pearson correlation is more appropriate for large matrices. 


The example of matrix C given in d6|) shows that the block rearrangement algorithm is more 
likely to identify a dependence structure that minimizes the variance since the standard RA may 
converge to a matrix X such that g(X) ^ — 1, whereas the block RA presented above ensures that 
the resulting matrix is such that g(X) is —1. However, the contraposive of Proposition 11.11 is not 
true, thus there are situations for which g (X) = — 1, and thus <p QXgn * s minimized 

for every partition in two sets and the variance is not minimized. That is, we find a local minimum 
for the block RA presented above. Consider for example the matrices A\ and A 2 : 


/ 0.0662 

-0.9444 

0 

-0.5842 \ 

0.6524 

1.0061 

-0.0549 

0.2495 

0.3271 

-0.6509 

-1.3218 

-0.0833 

V 1.0826 

0.2571 

0.9248 

-0.9263 

/ 0.0662 

-0.9444 

0 

-0.5842 \ 

0.6524 

-0.6509 

-0.0549 

0.2495 

0.3271 

1.0061 

-1.3218 

-0.0833 

V 1.0826 

0.2571 

0.9248 

-0.9263 / 


Applying the block RAs described above to these initial matrices A\ and A 2 (with a stopping 
rule of g(X) = —1), results in convergence to two different matrices B\ and B 2 given by (TT|) 
and ([5]) with different row sums having variances 0.04346, and 0 respectively, and multivariate 
dependence measure g(B\) = g{B^) = — 1. For the various possible permutations of the columns 
of the matrix A \, there is a number of possible limit matrices or local minima, with variance of the 
row sums equal to 0,0.0049,0.0151,0.0217, and 0.0435 and over one third of the possible starting 
permutations (27 of 72) lead to limits that do not minimize the variance of the row sums. For 
small matrices this appears to be the rule rather than the exception. For example, for randomly 
generated 4x4 matrices with independent A7(0,1) distributed elements, the vast majority (more 
than 80%) appear to possess multiple local minima, in many cases five or more as in the example 
above. It should not be surprising that there may be several local minima, since this is a discrete 
optimization problem, less smooth when there is a small number of rows. Moreover the order in 
which the partitions are selected may effect which local minimum convergence is to. If the global 
minimum is required, then we can begin with a number of different starting configurations, and 
also rely on the randomness of the Block RA2 and see whether convergence is to a common point. 
Note that this algorithm can be applied instead to a subset of the rows of X, but when o(X) = —1, 
no further improvement is possible even on a subset of the rows. 


The block RA wi l l, in a fi nite amount of time , end up with a S-countermonotonic structure 
( Puccetti and Wane ( 2015b ): Lee and Ahn ( 20141 )) in which all sums over n and n are counter- 
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monotonic. To avoid the computationally expensive calculation of g(X) and pu for each II, we 
have the following variation on the Block RA that we will use throughout our examples. 


Block Rearrangement Algorithm 2 (Block RA2) 

1. Select a random sample of n S i m = min(512,2 n_1 — 1) possible partitions of the columns 
{1,2, ...,n} into non-empty subsets {II, 11}. Note if n S i m = 2 n_1 — 1, all partitions are con¬ 
sidered. 

2. For each of the above partitions, rearrange the second block so that iSjj is countermonotonic 
to the values of An- 

3. If there is no improvement in var(£T X,;), output the current matrix X, otherwise return to 
step 1. 

Remark 2.2. Note that the choice of n S i m governs a trade-off between complexity of one step 
of the algorithm and the number of steps required for eventual convergence. Regardless of the 
value of n s i m , the algorithms converge to the same set of possible local minima but the value of 
n s im governs the speed of that convergence. The relationship between the speed of convergence 
and n s i m is complicated since it depends on the values in the matrix X and the current set of 
correlations {(f> (5n, Sff) ;II € V}. Of course the computational speed also depends on the size of 
the matrix, which affects the time required to calculate the set of correlations {4> (Sn, 5'n)! H € V}. 
The ideal choice of n s i m from a computational point of view in a particular problem may have 
to be determined experimentally but theoretically, as mentioned above, the same set of candidate 
minima result from any choice of n S i m > 1. 


2.3 Comparison of performance of the RA and Block RA 

In this section, we compare the performance of the RA and BRA in achieving the global minimum 
or in approximating it. When there are three variables (n = 3), the RA and the BRA are equivalent 
as all blocks from the Block RA correspond to 1 column in one block and 2 columns in the other 
block. Therefore, there is no reduction in variance for n = 3. In what follows, we concentrate 
ourselves to cases when n > 4. 

When there is a small number of columns (for example n < 15), we are able to do a block RA 
taking all possible partitions into two blocks ( n s i m = 2 14 — 1 = 16,383), with the multivariate 
correlation g computed exactly and, on termination, equal to -1. 

For small matrices (less than 10 rows and 4 columns), we can determine the global minimum by 
trying every permutation of the columnsQ We then run the RA and the BRA to test whether they 
reach the global minimum, and if they do not, then we compute how far they are from this global 
minimum. Specifically, we repeat 10,000 times the following experiment: 

4 It is also possible to use a linear programming solver to solve for the global minimum. It could typically handle 
slightly larger matrices. 
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• Initialize the matrix X by simulating m independent Uniform[0,1] for the first column and 
then placing random permutations of these same values in the remaining n — 1 columns. 

• If m < 10 rows and n < 4 columns, permute columns 2,3 — 1 in all (m!) n ~ 2 ways, 
and arranging column n so that it is countermonotonic with the sum of the other columns. 
Among all these configurations, find the matrix X* whose row sums have the global minimum 
variance V*. 

• Apply the standard RA to X to obtain a local minimum X ra in which all columns are 
countermonotonic to the sum of the others. Compute the variance V ra of the row sums of 

X ra . 

• Apply the block RA, BRA2, to X ra to obtain the matrix X/, ra and the variance I4 ra of Ike 
row sums of Xf, m . 


How much does the BRA improve upon the RA? 

In order to compare the RA and the BRA, we compute the average value for V ra and for 14 m . 
The results are reported in Table [T] 

Table 1: Average variance for the RA and for the Block RA. Both averages are estimated with 
10,000 experiments as described above for different values of n and m. All digits reported in the 
table are significant. 


average of 

n 

Vra 

= 4 

Vbra 

n 

V ra 

= 7 

Vbra 

n 

Vra 

= 10 

Vbra 

m = 10 

0.001 

0.0006 

0.0004 

l.lxlO -5 

0.00018 

1.8x 10~ 7 

m = 100 

1.2 xl0~ 5 

5.5 xl0~ 6 

3.4x 10 -6 

8xl0 -8 

1.6xl0~ 6 

1.3xl0" 9 

m = 1,000 

1.2xl0 -7 

5.5x 10~ 8 

3.2xl0 -8 

7.6xlO -10 

1.6xl0 -8 

1.2xl0 -11 


We make the following observation on Table [lj The larger the number of variables n or the 
number of discretization steps m, the larger the improvement of the Block RA over the RA. We 
have performed other experiments with other distributions and we obtain similar results. 


Convergence of the RA and BRA algorithms to the global minimum variance 

Both the RA and the Block RA2 terminate because they are based on the variance of the row 
sums that decreases strictly at each step and is bounded from below by 0, and because there is 
a finite number of permutations, hence a finite number of values of this quantity. However, as 
we have shown, it is possible to end up at a local minimum of the variance instead of the global 
minimum. 

In Figure [TJ we plot the percentage of cases in which V ra , resp. I4 m is within a given tolerancfH 
of V*. It shows that this percentage decreases quickly to 0 as ra increases. Table [2] reports the 

5 10 -6 in this case. 
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averages of the difference V ra — V* and I4 m ~ P* ■ We find that the Block RA outperforms the RA 
by several orders of magnitude. 

Table 2: Average distance from the minimum for the RA and the BRA for n = 4 variables. 



m = 4 

m = 5 

m = 6 

m = 7 

RA 

0.0020 

0.0015 

0.0026 

0.0015 

BRA 

0.0001 

0.0002 

0.0003 

0.0003 



Figure 1: Percentage of cases that the minimum from the algorithms RA or BRA get to the global 
minimum (within 10 -6 ) 

Remark 2.3. The comparison in Table [2] is necessarily done with very small matrices as the global 
minimum V* is computed by computing the variance in all possible permutations of the matrix. 
For larger matrices, such a technique cannot be used. In fact, there are very few case s for which 
we know the value of the minimum. One option is to use the result of Haus (12015) that gives 
the minimum variance in the case of a matrix m by n that contains in each column the integers 
1,2,..., m. But this is a very specific case. In this case (at the minimum variance matrix), the mean 
of the sum is /i := n 1 +‘ 2 +--+ m and the sum takes two values M = [fx\ with probability q := n — |_/u.J 
and M+ 1 with probability (1 — q) so that the minimum global variance can be computed explicitly. 
We are then able to check the percentage of the time the RA, respectively the BRA, achieves the 
global minimum by starting from a randomized matrix (where each column has been randomly 
permuted). We obtain similar conclusions as in Figure [1] and Table O namely the percentage of 
the cases in which one achieves the global minimum decreases with n and m and the BRA is closer 
to the global minimum by several order of magnitude. 


In order to assess the convergence of the algorithm with larger matrices in a more general setting, 
we propose to generate matrices that all have constant row sums so that the variance of the row 
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sums is zero. We then randomly rearrange the values in each column, and then the RA and block 
RA2 can be applied to the matrix to see to what extent the minimum variance is achieved. For 
instance, we can generate a matrix of A/"(0,1) random variables with constant row sums as follows. 
First, generate independent Af(0, 1) random variables, then subtract the row mean from each row 
so that the sum is now 0. Lastly, multiply by the factor in order to return the marginals to 

A(0,1). 

Applying the Block RA2 with this matrix of A7(0,1) variables, we obtain the results in Table 
[3j Neither the RA nor the BRA guaranteed achieving the minimum possible variance in the 
simulations because of the complexity of the problem. Note, however that the BRA is between 10 
and 40 times closer to the optimum than is the RA for n > 5. These two conclusions are consistent 
with the previous examples. 


Table 3: Average distance from the minimum for the RA and the BRA with 10,000 simulations 
with m = 10. 



n = 4 

n = 5 

n = 6 

n = 7 

n = 8 

RA 

0.02 

0.001 

0.007 

0.005 

0.004 

BRA 

0.005 

0.0009 

0.0003 

0.0001 

0.00009 


The RA and Block RA have been developed to minimize the expectation of a convex fu nction 
of the sum of dependent random variables with given marginal distributions. As shown in Hausl 
( 20151 ), checking the complete mixability condition is a NP-complete problem (even in the case of 3 
variables only), and therefore there exists no algorithm with polynomial complexity that converges 
to the global minimum with certainty. Neither the RA nor the Block RA guarantees convergence 
to the global minimum. Furthermore, our counterexample © and © in Section Q] shows that the 
block RA may end up in a strict local minimum with a positive variance while the global minimum 
for the variance is equal to 0. Nevertheless the Block RA seems to approximate the global minimum 
to a reasonable degree of precision for large matrices. 


3 Application to finding the dependence to get a target distribu¬ 
tion for the sum 


As discussed in the introduction, the Rearrangement Algorithm has been widely used in finance 
and risk management. In this section, we discuss a new application as to infer the joint distribution 
among variables for which the distribution of the sum, the difference, or a weighted sum is known. 
We first illustrate the methodology with sums of normal or uniform variables. Next, we discuss a 
real-world application with the example of spread options. 
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3.1 Indistinguishability 


We base our analysis on two goodness of fit test statistics. The first one is the Kolmogorov-Smirnov 
(KS) test. It is fully non-parametric and applies to all target distributions. The second one is less 
well-known but based on a more appropriate measure of distance in our context, the L 2 -Wasserstein 
distance measure. The KS test is based on the following results. Suppose F m is the empirical c.d.f. 
from a sample of size m with true distribution F. Define D m = sup x \F m (x) — F(x)\, then the 
asymptotic distribution of y/mD m is well-known @ We may use this asymptotic result to determine 
the median of the distribution for large m or use simulations to approximate this value for finite 
m. For example, when m = 10 6 , using simulations, we obtain that the median of D m , medF{D m ), 
is approximately equal to 8.2 xlCH 4 so that any distribution within a region F(x ) ±8.2 x 10 -4 will 
fall in a pointwise 50% confidence interval around F. This is a very strong result as it implies for 
example that it falls in all standard (e.g., 95%, 99%) confidence intervals. 

If G(x) falls in such an interval based on a sample of m = 10 6 observations, it is usually 
indistinguishable from the target cdf F. So for the purpose of this paper, we define: 

Definition 3.1. G(x) is empirically KS-indistinguishable from F(x) with a sample size ofm = 10 6 , 

if 

sup | G(x) — F{x) | < medF{D m ) (7) 


The KS test and therefore Definition 13.11 applies to any cdf F. Of course, other test statistics 
might also be used with empirical data to determine the fit of the normal distribution. Observe also 
that the KS test is based on the distance between the cdfs, using the distance D m defined above. 
But the test statistic most consistent with the rearrangement algorithm is the L 2 -Wasserstein 
metric, which measures the squared L 2 distance between the quantile functions (see for example 
Krauczi ( 20091 )). fg |G _1 (a) — F~ l (u)\ 2 du. 


Let us define the following distance from an empirical quantile function F m (u) to the distribution 
F (related to the L 2 -Wasserstein squared distance) 


T m = C \F-\u)-F-\u)\ 2 du (8) 

Jo 

Analogous to Definition 13.11 we define 

Definition 3.2. G(x) is empirically L 2 —W-indistinguishable from F(x) if f ( ] |G _1 (tt)— F~ 1 (u)\ 2 du < 
medi?(T m ) for m = 10 6 . 


The asymptotic distribution of T m depends on F (unlike to the KS distance that was dis- 
cussed above). The asymptotic di strib uti on of T m is known for various F (see for example 


cussed above). me asymptotic di strip uti on or i m is Known lor various r (^see lor example 
del Barrio. Cuesta-Albertos. Matran. et al. ( 199 91)). However, it is a functional of a Brownian 


bridge and so for finite m we again determine the median from a simulation. 


6 lim m _ >+00 P(s/mDm <t) = H(t):= 1-2 (— 1)V 


-2 k 2 t 
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When the distribution F is the standard normal distribution 1V(0,1), then medi?(T m ) is approx¬ 
imately 3.5 x 10 -6 and when F is U[— 1,1], then it is approximately 4.7 x 10~'. Combining these 
two test statistics, we thus define the notion of empirical indistinguishability. 

Definition 3.3. G(x ) is empirically indistinguishable from F(x ) if it is both empirically L 2 — W- 
indistinguishable from F(x) and empirically KS-indistinguishable from F(x). 

3.2 Sum of two or more uniform distributions 

Perhaps surprisingly, there is a copula such that the sum of n > 2 uniform random variables is 
empirically indistinguishable from a normal distribution. For convenience, we choose expected 
values equal to 0 and X t are uniformly distributed over [—o,a], i.e. Xi ~ U\—a,a\,i = 1,2,...,n. 
We want to show there is a dependence structure such that the sum of a given number of uniform 
random variables on [—a, a] is close to Normal A7(0,1) distributed. In other words, we seek a copula 
for random variables X\, X 2 ,..., X n where Xi r*j U[-a,a],i = l,...,n and X n+ \ = — Z is J\f(0, 1) 
such that var(S') = var(Xi + X 2 + ... + X n+ \) is minimized. 


Block RA with U[—a,a\ to achieve a J\T( 0,1) 

1. Start with an initial value of a = 1.5. 

2. Run the block RA. Periodically, at each step 1 of the block RA, replace a by a constant 
chosen such that var(^” =1 Xf) = 1. 

3. Terminate the block RA when both var(5) and the value of a fail to change by a given 
tolerance. 


This algorithm permits finding a copula such that the sum of n uniform U[—a,a] is indistin¬ 
guishable from a normal random variable. We can change the target distribution to a variety of 
distributions and still get near equality up to a change in the location and scale of the target. 

Proposition 3.1. For each n > 2, there exist a copula and a value of a > 0 such that the sum of 
n dependent U[—a, a] with that copula is empirically indistinguishable from the J\f (0,1). 


This theoretical result is not surprising. Ruodu Wang pointed out to us that all unimodal- 
symmetric distributions supported in [—2a, 2a] can be represented as the distribution of the sum of 
2 variables uniformly distributed on [—a, a]. The existence of such depe ndence structu re betwe en 
two uniform variables can be proved using arguments of joint mixability ( Wang and Wang f 20ld ll. 
The proof of this proposition requires only that we choose a large enough that the KS and the 
L 2 — Wasserstein differences between the standard normal distribution and the normal distribution 
constrained to lie in the interval [—2a, 2a] is small, say less than 10~ 6 , and then represent this 
conditional normal distribution with the sum of two random variables uniformly distributed on 
[—a, a]. Our approach makes it possible to construct the explicit dependence between the two 
uniform variables numerically and identify a suitable value of a. 
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We will verify it by a numerical evaluation of the integrals in KS distance © and L 2 -W distance 
([8]) with 50,000 steps. Table [4] confirms the result for n = 2,3,4. The critical value are respectively 
given by med(D m ) = 8.2 x 10~ 4 and med(T m ) = 3.5 x 10 -6 


Table 4: Sums of Uniform U[—a,a\ and target cdf is a normal jV"(0,1). We report the values of 
the KS distance in the second column and the L 2 — W distance in the third column. The L 2 — W 
distance is the variance of X\ + X 2 + ... + X n — Z where Z has cdf 0,1) and Xj U[-a,a]. 


n 

KS distance 

L 2 — W distance 

a 

2 

5.5 x 10” 5 

9.3 x 10“ 7 

2.08 

3 

3.2 x 10~ 5 

4.9 x 10 -7 

1.38 

4 

2.2 x 10“ 5 

3.5 x 10~ 10 

1.31 


We used a numerical evaluation of the distances m and ([H]) using a grid of 50,000 points. 
For any n > 4, if n is odd, we can build the copula for the first three columns and then add 
countermonotonic pairs of uniform random variables Xi = Ui , W +1 = —Ui etc., where U are 
U[—a,a] and independent of the first three columns. Similarly, we treat the case n > 4 when n is 
even. Indeed we obtain Kolmogorov Smirnov distances well within the above-mentioned bound of 
0.00082. Moreover, the observed value of T m is again well within the 50% confidence interval based 
on the statistic T m . □ 

The joint density of this copula is obtained in Panel A of Figure [2] using a nonparametric density 
estimator for 10,000 data values. In general, the standard RA leads to a bivariate density that is 
significantly less smooth than the one obtained by the Block RA. 


3.3 Sum of two or more normal distributions 

If we reverse the roles of these two distributions, we can show that the sum of two dependent 
W(0,a 2 ) random variables is KS-empirically indistinguishable from a li[— 1,1] random variable 
with joint density of the copula displayed in Panel B of Figure [2] below. In this case the target 
distribution is the IA\— 1,1] and the asymptotic results for the L 2 -Wasserstein test are complex so 
we obtained the critical value medp(T m ) ~ 4.7 x 10~' from simulations with m = 10 6 . 

Proposition 3.2. For each n > 2, there exist a copula and a value of a 2 >0 such that the sum of 
n dependent J\f(0, a 2 ) with that copula is empirically indistinguishable from U[— 1,1]. 

Table O hereafter provides the result for n = 2, 3 and 4. However, for any n > 3, we can build the 
copula for the first two or three columns and then add countermonotonic pairs of random variables 
X 4 = Z 4 , A 5 = — Z 4 etc. where Zj are independent J\f(0,a 2 ) independent of the first two or three 
columns. Once again we use a numerical evaluation of the distances (0 and ([HI) using a grid of 
50,000 points. This verifies that such a copula exists for any n ^ 2. 


16 







joint density of the first two columns 


joint density of the first two columns 


0.35 ~r 



Panel A Panel B 


Figure 2: Panel A: Joint density of two U\— 2.056, 2.056] random variables whose sum is indis¬ 
tinguishable from jV(0,1). Panel B: Joint density of the two marginally normal random variables 
A7(0, 0.3363 2 ) whose sum is indistinguishable from U\— 1,1]. 


Table 5: Sums of Normal A7(0, a 2 ) and target cdf is a Uniform over [—1,1], We report the values 
of the KS distance in the second column and the L 2 — W distance in the third column. This is the 
variance of X\ + X 2 + ... + X n — Z where Z has cdf U[— 1,1] and X{ ~ A7(0, cr 2 ). 
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KS distance 

L 2 — W distance 

a 

2 

4.8x 10 -5 

2.3xl0 _lu 

0.3363 

3 

2.7xl0 -5 

4xl0” n 

0.4 

4 

1.9x 10 -5 

7xl0~ 12 

0.45 


3.4 Application in Finance 


The above examples using normal and uniform variables may suggest that the methodology has 
limited practical implications. This is not correct as the methodology can be very useful in finance 
to infer the dependence among assets in the risk-neutral world (i.e., using option prices as sole 
available information). We briefly outline the methodology and show how it can be used to choose 
a pricing model for spread options that is consistent with option prices on each asset and on the 
spread (difference). It is particularly interesting in fitting the bivariate distribution between the 
gas price and the electricity price given that spark spread options are actively traded. Spark spread 
options are options on the spread between natural gas and electric power as St = Pt~ hGt where 
Pt and Ci denote futures prices of power and gas, and h is th e heat rate or efficiency ratio of a 
typical gas fired power plant (see Alexander and Scourse ( 20041 ) . Carmona and Durrleman ( 2003 1. 
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Rosenberg ( 2000 )). Let T be the maturity of all options under consideration. From option prices on 
an asset with a large number of strikes it is possible to infer t he marginal distrib ution of this asset 
( Ait-Sahalia and La ( 200(1 ) . Breeden and Litzenberger ( 1978l h Bondarenkol ( 200,‘lh h Assuming that 
prices of spread options are available at the same time as options written on gas and electricity, 
it is possible to infer the marginal distributions of gas and electricity returns and of the spread. 
Our methodology can then be used to infer the dependence between gas and electricity returns as 
follows 


1. Use options on gas prices, on electricity prices and on the spark spread to derive the distri¬ 
b ution function Fp of Pr- F g of hGr, Fg of Sr respectively (e .g ., following the met hodology 
of Ait-Sahalia and Lo ( 2000h . Breeden and Litzenberger 1 1978h . Bondarenko ( 20031 )1 . 


2. For a given maturity, apply the block RA on a matrix with 3 columns and m rows (where m 
is the number of discretization steps). Each column contains a discretized distribution: 


In the first column 


In the second column 


In the third column 


F, 


—F, 


-l 


-l 

G 


771+1 


771+1 


—F, 


-l 


1 = 1,2, ..., 771 


1 = 1,2, ..., 771 


1 = 1,2, ..., 771 


m + 1 

• Apply the Block RA on the full matrix 

Output: Extract the first two columns to describe a discrete copula that is consistent with 
the information on the marginal distributions of Pp, Gp , and of the spread Sp. 


By repeating the above experiments sufficiently many times, one can describe models that are 
consistent with the information of the margins Fp, Fq and Fg. 


4 Alternative approach: MCMC Block RA 


As mentioned earlier, the problem of converging to the minimum of a convex function of the sum 
is NP-complete. It is thus not possible to find a deterministic algorithm that converges to the 
global minimum in polynomial time. We end the paper with an alternative direction that relies 
on a stochastic algorithm to achieve the convergence to the global minimum asymptotically in 
polynomial time (Theorem l4.il) . 

The RA and the Block RA converge to a possible solution for the global minimum of the ex¬ 
pectation of a convex function of the sum (Proposition ll.ll) . However, when there are more than 
3 variables involved, the dependence structure that achieves the global minimum variance does 
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not necessarily minimize other convex functions of the row sums. In this section, we develop a 
stochastic algorithm that is able to identify the global minimum in finite time. 

Consider the matrix X = [Xj, X2,..., X n _i, X n ]. For a set of columns II, we denote by S.n = 
S.n(X) the vector of sums 



and by X.n the submatrix X^. i = l,...,m, j 6 II. Assume, without loss of generality, that 
the column sums of X are all zero. For simplicity, consider the partition II = {1,2and 
fl = {k + 1,..., n}. We consider operations, which rearrange the rows in II while keeping those in II 
unchanged. The mean of the vector S.n + S.q is unchanged. For a positive convex function /(s) 
of these row sums, 

/(S. n + S. n )>/(S. n + S.“ n ) 

where S“jj consists of the same components as S.q but arranged to be countermonotonic to S.n- 
An operation, which rearranges the rows in II countermonotonically, while keeping those in II 
unchanged results in a reduction in a convex loss function. This choice is the basis of the block 
RA. 


We now design a stochastic algorithm to determine local and global minima of /(S..(X)), where 
S..(X) denotes the m sums over all columns. In particular, we define 


£(X) 


1 


and construct a Markov Chain designed to find the maxima of £(X). Any other distribution £(X) 
whose probabilities are decreasing functions of /(S..(X)) such as exp(—T/(S..(X))) for some 
T > 0 would also suffice. We choose a random partition II uniform over the 2 n_1 — 1 possible 
partitions. We then propose a random rearrangement of the rows of X t ^, k £ n designed so that 
after the rearrangement, S.n, S.q will tend to be countermonotonic. We then “accept” the move to 
this new matrix X', say, with probability 


min 



Note that larger values of tend to lead to acceptance of the move, and smaller values tend to 
result in remaining at X. We arrange that for a given partition II the proposal depends only on the 
matrix X.n- If f(X) < 00 for all X, this algorithm results in a finite state ergodic Markov Chain, 
which converges to a stationary distribution with positive probability on all possible states of the 
chain so that states with a very small value of /(S..(X)) appear with higher frequency. 

We wish to select a random permutation s* of the rows of X.q, which depends only on S.n in 
such a way that, after rearrangment, S.n, S.n tend to be countermonotonic. To do so, we choose 
S.n to be ranked identically to independent observations from a location family of distributions 
Yi - S in where Y t g(y). We might choose g(y) to be normally distributed with mean 0 and 
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variance a 2 or any other location family of distributions. We used g(y ) following the Gumbel 
extreme value distributior0 

g(z)=re~ rz ex p(-e~ rz ). (9) 

When the scale parameter 1/r of this location family approaches 0, this ranking approaches a coun¬ 
termonotonic one and the algorithm approaches the Block RA. An illustration of the m densities 
from which we simulate independently is represented in Figure [SJ 



Figure 3: Example with m = 6 and the 6 location families from which we generate independent 
random numbers corresponding to the 6 observations of S^n, these values are marked with 

Algorithm: Repeat n S i m times, with initial matrix X 

1. Propose II. Determine S 9 n, and S^n- For independent random variables Yi, i = 1,2 ,...m 
drawn from @ generate a random permutation s*(n) by ranking the observations Y — 
5,n- We then reorder the rows of X.^ using this permutation to obtain a proposal matrix 
with Xb = X. s *(j=[) and leaving the columns in X.n unchanged. The distribution of the 
permutation s* depends only on X.n- 

2. Accept the proposed rearrangement of the rows of X.^ with probability proportional to 

min(l, otherwise do not rearrange. 

3. Record the states of the system X having small values for /(X) and their frequencies. 

We wish to identify the stationary distribution /i(X) of this Markov Chain. Suppose Y is a 
matrix identical to X on the columns II and with columns II a permutation of those of X, i.e. 
Y.jj = X. s .(fj). Then the transition probability matrix is defined by: 

P x.Y = 2ra -i_ 1 g(Y|X.n) min (l, ®) - 

7 A similar family of distributions g(z) = p^j e~ rz exp(— e~ z ) is obtained as the logarithm of a Gamma distributed 
random variable and provides a random permutation from the well-known Gamma ranking family dsternl (Il990i 'll. 
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Here, (/(Y|X.n) is the probability of proposing the permutation s*(n) based on the row sums 5,n- 
The equilibrium distribution /x(X) must satisfy 

J>(X)^c,y = /x(Y), and J>(X) = 1. (10) 

X X 


Although it may be difficult in general to solve this system of equations, provided 0 < t'(X) < oo, 
for all X, this is a finite state irreducible positive recurrent (ergodic) Markov Chain. Therefore the 
station ary d istribution is such that every state has positive probability (see Theorem, page 393, 
Feller ( 19571 )). Thus it guarantees that every state is visited in finite time, and that the expected 


time before the chain visits the global minimum is finite. If there is a matrix X such that £(X) = 0, 
then the chain is absorbed and the algorithm terminates at this optimum. 


This algorithm offers a compromise between rapid initial convergence and a guarantee that the 
global minimum variance will eventually be achieved. Depending on the choice of scale parameter, 
it offers a rapid convergence to a region in which the objective function is small, followed by 
fluctuations around the local minima of the function. Since every point X in the sample space of all 
possible column permutations is visited with frequency proportional to /r(X) we are guaranteed that 
the global minimum will be reached in a finite amount of time. Indeed the stationary probabilities 
/z(X) represent the reciprocals of the mean recurrence time to this state. 

Theorem 4.1. The above algorithm generates a Markov Chain on the state space of matrices 
(X n ) ng ^, which converges to its stationary distribution /i(X) (see (fTUj) ). The probability that the 
global optimum X m i n is not found after N simulations is o(q N ) for some q < 1. 


Proof. The proof is a consequence of well-known results concerning the convergence of a finite state 
ergodic Mar kov C hain. For the geometric rate of convergence to the stationary distribution, see 
for example Cinlar ( 19751 ). □ 


We run this algorithm using as starting matrix B\, the matrix discussed earlier, for which for all 7 
possible partitions n, fl we have that 5n, <Sn are countermonotonic so that 4> (X)ign X?;, X?:gn Xj) = 
— 1. This is a local optimum for the Block RA. The variance of the row sums is 0.04346. 


B i 


/ 0.0662 
0.3271 
0.6524 
\ 1.0826 


0.2571 

1.0061 

-0.6509 

-0.9444 


0 

-1.3218 

-0.0549 

0.9248 


-0.5821 \ 
-0.0833 
0.2495 
-0.9263 / 


( 11 ) 


Figure U illustrates the trajectory of the above algorithm for this initial matrix. Note that 
it successfully climbs out of local valleys and in less than 500 steps is able to find the matrix 
corresponding to the global minimum variance of 0. Of course the number of steps required to find 
this optimum is, in general, random but if we were to use enumeration, we would require evaluating 
the rows sums over a sample space of (4!) 3 = 13824 different matrices. 
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Figure 4: Trajectory of the above algorithm for the initial matrix B\ 


The preceding example is somewhat atypical of the performance of the algorithm because the 
minimum variance is 0 and eventually this Markov Chain is absorbed in this state. When the min¬ 
imum is strictly positive, the chain tends to fluctuate around its equilibrium distribution described 
by Theorem 14.11 above. 

For example, suppose we begin with the matrix X below, which was obtained by generating the 
first column as ordered Z7[0,1] variables and the second and third columns are random permutations 
of the first. 



/ 0.0074 

0.8657 

0.8574 \ 


0.2957 

0.2957 

0.3569 


0.3569 

0.6067 

0.6067 

X = 

0.4638 

0.8574 

0.4850 

0.4850 

0.0074 

0.2957 


0.6067 

0.4638 

0.8657 


0.8574 

0.4850 

0.4638 


\ 0.8657 

0.3569 

0.0074 

In this case, the minimizing matrix is 

/ 0.0074 

0.8657 

0.6067 


0.2957 

0.8574 

0.3569 


0.3569 

0.2957 

0.8574 



0.4638 

0.4638 

0.4850 


0.4850 

0.4850 

0.4638 


0.6067 

0.0074 

0.8657 


0.8574 

0.3569 

0.2957 


\ 0.8657 

0.6067 

0.0074 


with variance of the row sums equal to 0.0012. 
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The trajectory in Figured] clearly shows the fluctuations around a stationary distribution in the 
variance of the row sums over these 5,000 iterations. 
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Figure 5: Fluctuations around a stationary distribution 


5 Conclusions 


This paper proposes an improved rearrangement algorithm and a stopping rule. It can efficiently 
find the dependence structure that minimizes the variance of the sum of n dependent variables. It 
is thus able to infer the dependence between n — 1 variables such that the last variable is equal to 
the sum of the n — 1 first variables. As already discussed extensively in the introduction, this idea 
is useful in identifying the optimal structure to achieve the Value-at-Risk bounds with a variance 
constraint where t he aggregate risk that maximizes and mini mizes the Value-at-Risk is a two- 
point distribution ( Bernard. Riischendorf. and Vanduffell ( 201(1 )). This idea can also be exploited 
in finance to infer the joint distribution among assets for which prices of spread option or basket 
options are available. 
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